KMID : 1022420190110020057
Phonetics and Speech Sciences
2019 Volume.11 No. 2 p.57 ~ p.64
Performance comparison of various deep neural network architectures using Merlin toolkit for a Korean TTS system
Hong Jun-Young

Kwon Chul-Hong
Abstract
In this paper, we construct a Korean text-to-speech system using the Merlin toolkit, an open-source system for speech synthesis. HMM-based statistical parametric speech synthesis is widely used in text-to-speech systems, but the quality of its synthesized speech is known to be degraded by limitations of its acoustic modeling scheme, which incorporates context factors. We propose an acoustic modeling architecture based on deep neural network techniques, which have shown excellent performance in various fields. The architectures considered are a fully connected deep feedforward neural network (DNN), a recurrent neural network (RNN), a gated recurrent unit (GRU), long short-term memory (LSTM), and bidirectional LSTM (BLSTM). Experimental results show that including sequence modeling in the architecture improves performance, and that the architectures with LSTM or BLSTM perform best. We also found that including delta and delta-delta components in the acoustic feature parameters is advantageous for performance.
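The delta and delta-delta components mentioned in the abstract can be illustrated with a minimal sketch. It assumes the three-tap regression windows commonly used in statistical parametric synthesis, (-0.5, 0, 0.5) for delta and (1, -2, 1) for delta-delta, with edge frames repeated at the boundaries. The abstract does not state which windows or padding the authors used, and the function names below are hypothetical.

```python
from typing import List, Sequence

# Assumed regression windows; the abstract does not specify them.
DELTA_WIN = (-0.5, 0.0, 0.5)   # first-order dynamic feature
ACC_WIN = (1.0, -2.0, 1.0)     # second-order dynamic feature


def apply_window(track: Sequence[float], win: Sequence[float]) -> List[float]:
    """Apply a 3-tap window along time, repeating the first and last frame."""
    n = len(track)
    out = []
    for t in range(n):
        prev = track[max(t - 1, 0)]
        nxt = track[min(t + 1, n - 1)]
        out.append(win[0] * prev + win[1] * track[t] + win[2] * nxt)
    return out


def add_dynamic_features(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return frames laid out as [static..., delta..., delta-delta...]."""
    if not frames:
        return []
    dims = len(frames[0])
    # One time series per static dimension.
    tracks = [[frame[d] for frame in frames] for d in range(dims)]
    deltas = [apply_window(track, DELTA_WIN) for track in tracks]
    accs = [apply_window(track, ACC_WIN) for track in tracks]
    return [
        list(frames[t])
        + [deltas[d][t] for d in range(dims)]
        + [accs[d][t] for d in range(dims)]
        for t in range(len(frames))
    ]


if __name__ == "__main__":
    # Toy one-dimensional static track, for illustration only.
    for row in add_dynamic_features([[0.0], [1.0], [4.0], [9.0]]):
        print(row)
```

Each output frame is three times as wide as the static frame, so a network trained on these targets predicts the local trajectory of each parameter as well as its value.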
KEYWORDS
deep neural networks, Merlin toolkit, text-to-speech (TTS)
Listed journal information
Korea Research Foundation (KCI)